Re: [GENERAL] Geometric, getting x and y co-ordinates GOING MAD!!!!! - Mailing list pgsql-general
From | Stuart Rison |
---|---|
Subject | Re: [GENERAL] Geometric, getting x and y co-ordinates GOING MAD!!!!! |
Date | |
Msg-id | v04020a0ab3829c147bf5@[128.40.242.190] |
In response to | Re: [GENERAL] Geometric, getting x and y co-ordinates GOING MAD!!!!! (selkovjr@mcs.anl.gov) |
List | pgsql-general |
At 7:33 pm -0400 7/6/99, selkovjr@mcs.anl.gov wrote:
>On Fri, 4 Jun 1999, Stuart Rison wrote:
>
>[snipped -- a float return type drama with a happy end]
>
>That wasn't so much about C as it was about how postgres handles
>return values. Here's the relevant doc page:
>
>http://www.postgresql.org/docs/programmer/xfunc414.htm
>
>As far as 'bluffing', that is what perl was intentionally built for. C
>is remarkably simple, but it assumes you know not only WHAT it does
>but also HOW. It's easy to get shot if you forget about the how part
>for a moment.

That's very true; I certainly seem to do most of my bluffing in Perl. Thanks for the doc ref, it makes -a bit- more sense now.

>> This all stemmed from me trying to write a standard deviation/variance set
>> of aggregate functions. This was just because a point seemed -at the time-
>> like quite a 'cute' way of storing two floats in one base type, eliminating
>> the need for arrays (the two floats being the sum of the elements and the sum
>> of the elements squared, stored as the x and y coords of a point
>> respectively).
>
>Although point type is a cute way of storing float pairs, it may
>become extremely inefficient in case of mega-tables. What are you
>going to do with your points? Do you build indices on them? How are the
>points distributed in 2-D? The type of distribution and order of
>points affect the performance of R-trees.

I think I may have confused you. Did you think I was storing a table of points as a way of storing a value with a confidence interval (e.g. 6.3+/-0.37), or perhaps matching x and y values for linear-regression-type stats? Sadly, my aggregate functions are far more trivial than that!!! The point (and there is only one) is used literally as a way of storing two floats in the 'sfunc1' of my stddev and variance aggregates.
It's a very crude aggregate, based on a posting by Jan Wieck a long time ago that used pg/tcl, and meant more to teach myself the basics of defining a new aggregate than to be used extensively. The idea is that, in a single-pass algorithm (which I would argue is what an aggregate is), you need to keep track of a minimum of three values to get an accurate calculation of variance (and, by extension, standard deviation):

- the sum of the elements in a series
- the sum of the squares of the elements in a series
- the number of elements in a series

'sfunc2' could easily cope with the number of elements in a series, but that left sfunc1 to store two floats, and I couldn't find a way of getting sfunc1 to cope with arrays, so I just used a point instead (Jan used a Tcl list).

>Also, if you were looking to store the (mean, SD) values in one
>column, you would be better off with the whole new type. If your
>science/confession would allow you to represent random distributions
>as intervals, such as (mean - SD/2, mean + SD/2), the intervals could
>be stored as a 1-D geometric type and indexed with R-tree, with some
>caution. If that makes sense, welcome to my segment type:
>
>http://wit.mcs.anl.gov/~selkovjr/seg-type.tgz

Yes, it makes sense, and I had a look at your segment type work. Although I don't have a need for it yet, it looks very impressive (big sigh... what a long and steep 'C' learning curve ahead of me... can I write all my functions in Perl and get those to be linked dynamically? ;) )

>It already has some provision for the (mean, SD) syntax, but that
>needs debugging. It works great with 'lower .. upper' syntax, where
>either 'lower' or 'upper' can be omitted. Besides, it is a variable
>precision type: your query returns exactly as many significant digits
>as you have inserted. (I couldn't stand the frustration it gives you
>when it returns 1.2000000 for the value you stored as 1.20.
>Even 1.20 and 1.2 make a huge difference when you deal with
>measurements)
>
>--Gene

As a final suggestion for the TO-DO list: should basic statistical functions (STDDEV, VARIANCE and perhaps MODAL) be added to the standard set of aggregates?

Best regards,

Stuart.

+-------------------------+--------------------------------------+
| Stuart Rison            | Ludwig Institute for Cancer Research |
+-------------------------+ 91 Riding House Street               |
| Tel. (0171) 878 4041    | London, W1P 8BT, UNITED KINGDOM.     |
| Fax. (0171) 878 4040    | stuart@ludwig.ucl.ac.uk              |
+-------------------------+--------------------------------------+